Integrating information theory and adversarial learning for cross-modal retrieval
Authors
Abstract
Accurately matching visual and textual data in cross-modal retrieval has been widely studied in the multimedia community. To address the challenges posed by the heterogeneity gap and the semantic gap, we propose integrating Shannon information theory and adversarial learning. In terms of information theory, we integrate modality classification and entropy maximization adversarially. For this purpose, a modality classifier (as a discriminator) is built to distinguish the text and image modalities according to their different statistical properties. This discriminator uses its output probabilities to compute Shannon entropy, which measures the uncertainty of the classification it performs. Moreover, the feature encoders (as a generator) project uni-modal features into a commonly shared space and attempt to fool the discriminator by maximizing its entropy. This adversarial interaction gradually reduces the distribution discrepancy between cross-modal features, thereby achieving a domain confusion state in which the discriminator cannot classify the two modalities confidently. In addition, Kullback-Leibler (KL) divergence and a bi-directional triplet loss are used to associate intra- and inter-modality similarity between features in the shared space. Furthermore, a regularization term based on KL-divergence with temperature scaling is used to calibrate the biased label classifier caused by the data imbalance issue. Extensive experiments with four deep models on four benchmarks are conducted to demonstrate the effectiveness of the proposed approach.
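The two information-theoretic ingredients described above — the discriminator's Shannon entropy, which the encoders maximize to reach domain confusion, and the temperature-scaled softmax fed into a KL regularizer — can be illustrated with a minimal numeric sketch. This is an illustration of the quantities involved, not the paper's implementation; the logit values are made up for demonstration.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a higher temperature flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy H(p) = -sum p*log(p); largest when the classifier is most uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def kl_divergence(p, q):
    """KL(p || q), the divergence used to associate similarity distributions in the shared space."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Discriminator output for a projected feature: [P(image), P(text)].
confident = softmax([4.0, -4.0])   # the discriminator easily tells the modality apart
confused = softmax([0.1, -0.1])    # near the desired domain-confusion state

# The encoders (generator) maximize this entropy to fool the discriminator;
# it peaks at log(2) when both modalities are judged equally likely.
print(entropy(confident) < entropy(confused))                   # True
print(abs(entropy(softmax([0.0, 0.0])) - math.log(2)) < 1e-9)   # True

# Temperature scaling softens a peaked (biased) label distribution before
# the KL-based regularizer compares it against a reference distribution.
biased = softmax([3.0, 0.5, 0.5])
softened = softmax([3.0, 0.5, 0.5], temperature=4.0)
print(entropy(softened) > entropy(biased))                      # True
```

At the domain-confusion optimum the discriminator's output is uniform over the two modalities, so its entropy reaches the maximum log(2); the generator's objective in the paper pushes toward exactly this state.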
Similar resources
MHTN: Modal-adversarial Hybrid Transfer Network for Cross-modal Retrieval
Cross-modal retrieval has drawn wide interest for retrieval across different modalities of data (such as text, image, video, audio and 3D model). However, existing methods based on deep neural network (DNN) often face the challenge of insufficient cross-modal training data, which limits the training effectiveness and easily leads to overfitting. Transfer learning is usually adopted for relievin...
Cross-Modal Manifold Learning for Cross-modal Retrieval
This paper presents a new scalable algorithm for cross-modal similarity preserving retrieval in a learnt manifold space. Unlike existing approaches that compromise between preserving global and local geometries, the proposed technique respects both simultaneously during manifold alignment. The global topologies are maintained by recovering underlying mapping functions in the joint manifold spac...
HashGAN: Attention-aware Deep Adversarial Hashing for Cross Modal Retrieval
As the rapid growth of multi-modal data, hashing methods for cross-modal retrieval have received considerable attention. Deep-networks-based cross-modal hashing methods are appealing as they can integrate feature learning and hash coding into end-to-end trainable frameworks. However, it is still challenging to find content similarities between different modalities of data due to the heterogenei...
Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval
Thanks to the success of deep learning, cross-modal retrieval has made significant progress recently. However, there still remains a crucial bottleneck: how to bridge the modality gap to further enhance the retrieval accuracy. In this paper, we propose a self-supervised adversarial hashing (SSAH) approach, which lies among the early attempts to incorporate adversarial learning into cross-modal ...
Integrating Textual and Visual Information for Cross-Language Image Retrieval
This paper explores the integration of textual and visual information for cross-language image retrieval. An approach which automatically transforms textual queries into visual representations is proposed. The relationships between text and images are mined. We employ the mined relationships to construct visual queries from textual ones. The retrieval results of textual and visual queries are c...
Journal
Journal title: Pattern Recognition
Year: 2021
ISSN: 0031-3203, 1873-5142
DOI: https://doi.org/10.1016/j.patcog.2021.107983